Natural Language Processing, Corpus Linguistics, Corpus Based Grammar Research
Similar resources
Corpus linguistics meets language technology:
To the extent that NLP is used by QA systems, it is mostly limited to tokenization, named entity recognition, stemming, POS tagging, and shallow parsing. More sophisticated NLP such as (deep) syntactic parsing is hardly ever used. In the present paper I investigate why this should be the case and try to establish how deep syntactic parsing as developed in the field of corpus linguistics might c...
Web Text Corpus for Natural Language Processing
Web text has been successfully used as training data for many NLP applications. While most previous work accesses web text through search engine hit counts, we created a Web Corpus by downloading web pages to create a topic-diverse collection of 10 billion words of English. We show that for context-sensitive spelling correction the Web Corpus results are better than using a search engine. For t...
Corpus Design For Biomedical Natural Language Processing
This paper classifies six publicly available biomedical corpora according to various corpus design features and characteristics. We then present usage data for the six corpora. We show that corpora that are carefully annotated with respect to structural and linguistic characteristics and that are distributed in standard formats are more widely used than corpora that are not. These findings have...
Grammar-based Corpus Annotation
There is an increasing number of linguists interested in large syntactically annotated corpora (treebanks). Such corpora can serve as a base for statistical applications and, at the same time, may be used in theoretical linguistics as a source for investigations about language use. The most important treebank nowadays is the Penn Treebank (Marcus et al., 1993; Marcus et al., 1994). Many statist...
Journal
Journal title: Jazykovedný Casopis
Year: 2010
ISSN: 0021-5597
DOI: 10.2478/v10113-009-0019-6